One of the primary modules in py-goldsberry is the player module. It provides access to a multitude of player-level statistics.
Each class in the player module requires a specific playerID. If you have looked through the first tutorial, you can see that py-goldsberry has a built-in function that makes it easy to find the playerIDs for a given season.
In [1]:
import goldsberry
import pandas as pd
goldsberry.__version__
Out[1]:
One of the many things you can do with py-goldsberry is generate a list of game logs for a single player or for the entire league, depending on what you need. This can be accomplished very easily using two built-in methods and a simple custom function.
First, we generate a list of players from the current season using the built-in PlayerList() function, and convert it to a Pandas DataFrame.
In [2]:
players = goldsberry.PlayerList()
players2015 = pd.DataFrame(players.players())
players2015.head()
Out[2]:
Once you have the data in a DataFrame, you can take advantage of the Pandas functionality to search for specific players, teams, rookie cohorts, etc.
Let's start by looking for just James Harden.
In [3]:
players2015.loc[players2015['DISPLAY_LAST_COMMA_FIRST'].str.contains("Harden")]
Out[3]:
Fortunately, there is only one player with Harden somewhere in his name. If we had searched for James, it would have been a bit of a different story.
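To see why the search term matters, here is a minimal sketch on a tiny hand-made DataFrame. The rows are illustrative stand-ins, not data pulled from the NBA: only Harden's PERSON_ID comes from this tutorial, and the other IDs are placeholders.

```python
import pandas as pd

# Illustrative rows only; 201935 is the one ID taken from the tutorial,
# the remaining PERSON_ID values are placeholders.
sample = pd.DataFrame({
    "PERSON_ID": [201935, 1, 2, 3],
    "DISPLAY_LAST_COMMA_FIRST": [
        "Harden, James",
        "James, LeBron",
        "Johnson, James",
        "Curry, Stephen",
    ],
})

names = sample["DISPLAY_LAST_COMMA_FIRST"]

# "Harden" appears in exactly one name.
harden_rows = sample.loc[names.str.contains("Harden")]

# "James" matches first names and last names alike.
james_rows = sample.loc[names.str.contains("James")]

# Anchoring on the "Last, First" layout narrows the search to last names.
last_name_james = sample.loc[names.str.startswith("James,")]

print(len(harden_rows), len(james_rows), len(last_name_james))
```

Because the column stores names as "Last, First", matching on the prefix is one simple way to avoid picking up every player whose first name is James.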
Because we want to get information on James Harden, we need to make note of the value in his PERSON_ID column. This is the unique ID number associated with Harden in the NBA database. Anytime we want to search for James Harden-related information, this is the value to use.
To make it easy to remember, I'm going to save it as a variable in our environment that we can call anytime we want. It's a bit easier for me to remember harden_id than 201935.
In [4]:
harden_id = '201935'
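Instead of typing the ID by hand, you could also pull it out of the DataFrame. The sketch below is one possible approach, not part of py-goldsberry: find_player_id is a hypothetical helper, and the small DataFrame stands in for players2015 so the example runs without a network call.

```python
import pandas as pd

# Stand-in for players2015; only Harden's ID comes from the tutorial,
# the second row is a placeholder.
players = pd.DataFrame({
    "PERSON_ID": [201935, 1],
    "DISPLAY_LAST_COMMA_FIRST": ["Harden, James", "Placeholder, Player"],
})


def find_player_id(frame, name_fragment):
    """Return PERSON_ID as a string when exactly one name matches."""
    mask = frame["DISPLAY_LAST_COMMA_FIRST"].str.contains(name_fragment, regex=False)
    matches = frame.loc[mask]
    if len(matches) != 1:
        # Refuse to guess when the fragment is ambiguous or absent.
        raise ValueError(
            f"expected exactly one match for {name_fragment!r}, found {len(matches)}"
        )
    return str(matches["PERSON_ID"].iloc[0])


harden_id = find_player_id(players, "Harden")
print(harden_id)
```

Raising an error on zero or multiple matches keeps an ambiguous search such as "James" from silently returning the wrong player.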
One of many pieces of available data for a player is their game logs. You can access these by using the goldsberry.player.game_logs() class and passing in the playerID.
There are a few variables that can be manipulated in game_logs to adjust the data that gets returned. The most important is the season argument. When you instantiate the class, you must pass a valid player ID. When the class loads, it automatically grabs all of the game logs for the player for the current season.
In [5]:
harden_game_logs = goldsberry.player.game_logs(harden_id)
Now that we've collected the data from the NBA website, we want to create a Pandas DataFrame to view and analyze.
In [6]:
harden_game_logs_2015 = pd.DataFrame(harden_game_logs.logs())
Notice that we passed harden_game_logs.logs() and not harden_game_logs to the DataFrame constructor. This is because many of the calls in py-goldsberry return multiple sets of data. Instead of making multiple calls to the NBA's server, a single call is made and all of the data is stored in the class. The various methods of the class provide access to the raw data.
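The fetch-once, read-many pattern described above can be sketched in plain Python. This is a hypothetical illustration and not py-goldsberry's actual implementation: CachedEndpoint, fake_fetch, the metadata() method, and the payload values are all invented for the example.

```python
class CachedEndpoint:
    """Hypothetical illustration of one request feeding several accessors."""

    def __init__(self, fetch):
        # The single call happens once, when the object is created.
        self._payload = fetch()

    def logs(self):
        # Accessors only read from the stored payload; no new request is made.
        return self._payload["logs"]

    def metadata(self):
        return self._payload["metadata"]


calls = []


def fake_fetch():
    # Stand-in for one request to a remote server; values are made up.
    calls.append(1)
    return {
        "logs": [{"GAME": 1}, {"GAME": 2}],
        "metadata": {"player_id": "201935"},
    }


endpoint = CachedEndpoint(fake_fetch)
first = endpoint.logs()
second = endpoint.logs()
info = endpoint.metadata()

# The fetch ran once, however many accessor methods were called.
print(len(calls))
```

This is why a method such as logs() is what gets handed to the DataFrame constructor: the object itself is a container for everything the call returned.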
(Until the documentation is complete, take advantage of the [TAB] completion feature in Jupyter.)
In [7]:
harden_game_logs_2015.head()
Out[7]:
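The simple custom function mentioned at the start, for collecting game logs across many players, is not shown in this tutorial. A minimal sketch of what it could look like follows, assuming logs() returns a list of dict-like rows. The names collect_game_logs, fake_fetch_logs, and the PLAYER_ID_REQUESTED column are hypothetical, and the fetcher is passed in as an argument so the example runs offline.

```python
def collect_game_logs(player_ids, fetch_logs):
    """Gather game-log rows for several players into one list of dicts."""
    all_rows = []
    for player_id in player_ids:
        for row in fetch_logs(player_id):
            tagged = dict(row)  # copy so the fetched row is left untouched
            # Hypothetical column recording which ID produced the row.
            tagged["PLAYER_ID_REQUESTED"] = player_id
            all_rows.append(tagged)
    return all_rows


def fake_fetch_logs(player_id):
    # Offline stand-in for a real request; the rows are placeholders.
    return [{"GAME": 1}, {"GAME": 2}]


rows = collect_game_logs(["201935", "0"], fake_fetch_logs)
print(len(rows))
```

With the library itself, the fetcher could be something like lambda pid: goldsberry.player.game_logs(pid).logs(), mirroring the calls used above, and the resulting list can be passed to pd.DataFrame in the same way as a single player's logs.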
If you've found this helpful or have any other requests, shoot me an email at bradley@cardinaladvising.com or post an issue on GitHub.